'Misrepresent reality': AI-altered shooting image surfaces in U.S. Senate

The Japan Times

Photo caption: Backdropped by posters bearing images of Renee Good and Alex Pretti, the two U.S. citizens recently shot and killed by federal immigration officers, a Minneapolis resident keeps watch for ICE agents near a school where several students were recently arrested, in Minneapolis, Minnesota, on Thursday.

Washington - An AI-manipulated image depicting the moments before immigration agents shot an American nurse spread across the internet, eventually making its way onto the hallowed floor of the U.S. Senate. Social media platforms are awash with graphic footage from the moment U.S. agents shot and killed 37-year-old intensive care nurse Alex Pretti in Minneapolis, Minnesota -- a moment that sparked nationwide outrage. One frame from the grainy footage was digitally altered using artificial intelligence, according to AI experts.


Combating Misinformation in the Arab World: Challenges and Opportunities

Communications of the ACM

Addressing the Arab world's unique misinformation and disinformation challenges requires efforts at technical, institutional, and social levels. Misinformation and disinformation are global risks. However, the Arab region is particularly vulnerable due to its geopolitical instabilities, linguistic diversity, and other cultural nuances. Misinformation includes false or misleading content, such as rumors, satire taken as fact, or conspiracy theories, while disinformation is the intentional and targeted spread of such content to deceive or manipulate specific audiences. To limit the spread and influence of misinformation, it is essential to advance research on technological methods for early detection, tracking, and mitigation, while also strengthening media literacy and promoting active citizen participation.


Advancing Hate Speech Detection with Transformers: Insights from the MetaHate Dataset

Chapagain, Santosh, Hamdi, Shah Muhammad, Boubrahimi, Soukaina Filali

arXiv.org Artificial Intelligence

Hate speech is a widespread and harmful form of online discourse, encompassing slurs and defamatory posts that can have serious social, psychological, and sometimes physical impacts on targeted individuals and communities. As social media platforms such as X (formerly Twitter), Facebook, Instagram, Reddit, and others continue to facilitate widespread communication, they also become breeding grounds for hate speech, which has increasingly been linked to real-world hate crimes. Addressing this issue requires the development of robust automated methods to detect hate speech in diverse social media environments. Deep learning approaches, such as vanilla recurrent neural networks (RNNs), long short-term memory (LSTM), and convolutional neural networks (CNNs), have achieved good results, but are often limited by issues such as long-term dependencies and inefficient parallelization. This study presents a comprehensive exploration of transformer-based models for hate speech detection using the MetaHate dataset -- a meta-collection of 36 datasets with 1.2 million social media samples. We evaluate multiple state-of-the-art transformer models, including BERT, RoBERTa, GPT-2, and ELECTRA, with fine-tuned ELECTRA achieving the highest performance (F1 score: 0.8980). We also analyze classification errors, revealing challenges with sarcasm, coded language, and label noise.
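
As a rough illustration of the fine-tuning setup such a study implies (not the authors' code), the sketch below fine-tunes an ELECTRA checkpoint for binary hate speech classification with the Hugging Face Trainer; the two-post dataset is a placeholder standing in for the MetaHate collection.

from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

texts = ["example post one", "example post two"]   # placeholder posts
labels = [0, 1]                                     # 0 = not hate, 1 = hate speech

tokenizer = AutoTokenizer.from_pretrained("google/electra-base-discriminator")
model = AutoModelForSequenceClassification.from_pretrained(
    "google/electra-base-discriminator", num_labels=2)

def tokenize(batch):
    # Pad/truncate each post to a fixed length for batching.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

train_ds = Dataset.from_dict({"text": texts, "label": labels}).map(tokenize, batched=True)

args = TrainingArguments(output_dir="electra-hate-demo", num_train_epochs=3,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=train_ds).train()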


SocialDF: Benchmark Dataset and Detection Model for Mitigating Harmful Deepfake Content on Social Media Platforms

Batra, Arnesh, Kumar, Anushk, Khemani, Jashn, Gumber, Arush, Jain, Arhan, Gupta, Somil

arXiv.org Artificial Intelligence

The rapid advancement of deep generative models has significantly improved the realism of synthetic media, presenting both opportunities and security challenges. While deepfake technology has valuable applications in entertainment and accessibility, it has emerged as a potent vector for misinformation campaigns, particularly on social media. Existing detection frameworks struggle to distinguish between benign and adversarially generated deepfakes engineered to manipulate public perception. To address this challenge, we introduce SocialDF, a curated dataset reflecting real-world deepfake challenges on social media platforms. This dataset encompasses high-fidelity deepfakes sourced from various online ecosystems, ensuring broad coverage of manipulative techniques. We propose a novel LLM-based multi-factor detection approach that combines facial recognition, automated speech transcription, and a multi-agent LLM pipeline to cross-verify audio-visual cues. Our methodology emphasizes robust, multi-modal verification techniques that incorporate linguistic, behavioral, and contextual analysis to effectively discern synthetic media from authentic content.
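
The multi-factor idea lends itself to a simple structural sketch: run a face match, transcribe the audio, and let a language model cross-examine the combined evidence. Everything below is a hypothetical skeleton with stub components; it is not the SocialDF authors' pipeline.

from dataclasses import dataclass

def match_face(video_path: str, claimed_identity: str) -> bool:
    # Stub: a real system would run face recognition against reference images.
    return True

def transcribe_audio(video_path: str) -> str:
    # Stub: a real system would run automatic speech recognition on the audio track.
    return "transcript placeholder"

def llm_consistency_check(identity: str, face_match: bool, transcript: str) -> str:
    # Stub: a real system would have LLM agents cross-examine the combined evidence.
    return "inconclusive"

@dataclass
class Verdict:
    face_match: bool
    transcript: str
    assessment: str

def verify_clip(video_path: str, claimed_identity: str) -> Verdict:
    # Gather independent signals, then ask the language model whether they cohere.
    face_match = match_face(video_path, claimed_identity)
    transcript = transcribe_audio(video_path)
    assessment = llm_consistency_check(claimed_identity, face_match, transcript)
    return Verdict(face_match, transcript, assessment)

print(verify_clip("clip.mp4", "Public Figure"))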


Opioid Named Entity Recognition (ONER-2025) from Reddit

Ahmad, Muhammad, Farid, Humaira, Ameer, Iqra, Amjad, Maaz, Muzamil, Muhammad, Hamza, Ameer, Jalal, Muhammad, Batyrshin, Ildar, Sidorov, Grigori

arXiv.org Artificial Intelligence

The opioid overdose epidemic remains a critical public health crisis, particularly in the United States, leading to significant mortality and societal costs. Social media platforms like Reddit provide vast amounts of unstructured data that offer insights into public perceptions, discussions, and experiences related to opioid use. This study leverages Natural Language Processing (NLP), specifically Opioid Named Entity Recognition (ONER-2025), to extract actionable information from these platforms. Our research makes four key contributions. First, we created a unique, manually annotated dataset sourced from Reddit, where users share self-reported experiences of opioid use via different administration routes. This dataset contains 331,285 tokens and includes eight major opioid entity categories. Second, we detail our annotation process and guidelines while discussing the challenges of labeling the ONER-2025 dataset. Third, we analyze key linguistic challenges, including slang, ambiguity, fragmented sentences, and emotionally charged language, in opioid discussions. Fourth, we propose a real-time monitoring system to process streaming data from social media, healthcare records, and emergency services to identify overdose events. Using 5-fold cross-validation in 11 experiments, our system integrates machine learning, deep learning, and transformer-based language models with advanced contextual embeddings to enhance understanding. Our transformer-based models (bert-base-NER and roberta-base) achieved 97% accuracy and F1-score, outperforming baselines by 10.23% (RF=0.88).
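
For readers unfamiliar with transformer-based NER, the sketch below runs an off-the-shelf token-classification checkpoint (dslim/bert-base-NER, a general-purpose model) over a sample post. The opioid-specific entity categories described above would come from fine-tuning on the annotated ONER-2025 data, which is not assumed to be publicly downloadable here.

from transformers import pipeline

# General-purpose NER checkpoint; an opioid-specific model would instead be
# obtained by fine-tuning on the annotated Reddit data.
ner = pipeline("token-classification",
               model="dslim/bert-base-NER",
               aggregation_strategy="simple")

post = "Tried taking oxy again after the surgery, worst decision ever."  # example post
for entity in ner(post):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))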


Generative AI as Digital Media

Abiri, Gilad

arXiv.org Artificial Intelligence

Generative AI is frequently portrayed as revolutionary or even apocalyptic, prompting calls for novel regulatory approaches. This essay argues that such views are misguided. Instead, generative AI should be understood as an evolutionary step in the broader algorithmic media landscape, alongside search engines and social media. Like these platforms, generative AI centralizes information control, relies on complex algorithms to shape content, and extensively uses user data, thus perpetuating common problems: unchecked corporate power, echo chambers, and weakened traditional gatekeepers. Regulation should therefore share a consistent objective: ensuring media institutions remain trustworthy. Without trust, public discourse risks fragmenting into isolated communities dominated by comforting, tribal beliefs -- a threat intensified by generative AI's capacity to bypass gatekeepers and personalize truth. Current governance frameworks, such as the EU's AI Act and the US Executive Order 14110, emphasize reactive risk mitigation, addressing measurable threats like national security, public health, and algorithmic bias. While effective for novel technological risks, this reactive approach fails to adequately address broader issues of trust and legitimacy inherent to digital media. Proactive regulation fostering transparency, accountability, and public confidence is essential. Viewing generative AI exclusively as revolutionary risks repeating past regulatory failures that left social media and search engines insufficiently regulated. Instead, regulation must proactively shape an algorithmic media environment serving the public good, supporting quality information and robust civic discourse.


Utilizing Social Media Analytics to Detect Trends in Saudi Arabia's Evolving Market

Aalijah, Kanwal

arXiv.org Artificial Intelligence

Saudi Arabia has experienced swift economic growth and societal transformation under Vision 2030. This offers a unique opportunity to track emerging trends in the region, which will ultimately pave the way for new business and investment possibilities. This paper explores how AI and social media analytics can identify and track trends across sectors such as construction, food and beverage, tourism, technology, and entertainment, thereby helping businesses make informed decisions. By leveraging a tailored AI-driven methodology, we analyzed millions of social media posts each month, classifying discussions and calculating scores to track the trends. The approach not only uncovers emerging trends but also reveals diminishing ones. Our methodology is able to predict the emergence and growth of trends by utilizing social media data. This approach has potential for adaptation in other regions. Ultimately, our findings highlight how ongoing, AI-powered trend analysis can enable more effective, data-informed business and development strategies in an increasingly dynamic environment.
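
A toy version of the month-over-month trend signal such an approach might produce is sketched below; the placeholder counts and the growth-ratio formula are illustrative assumptions, not the paper's methodology.

from collections import Counter

# (month, sector) pairs produced by an upstream post classifier -- placeholder data.
classified_posts = [("2024-05", "tourism"), ("2024-05", "tourism"),
                    ("2024-06", "tourism"), ("2024-06", "tourism"),
                    ("2024-06", "tourism"), ("2024-06", "construction")]
counts = Counter(classified_posts)

def trend_score(sector, prev_month, curr_month):
    # Relative month-over-month growth in mentions of a sector.
    prev, curr = counts[(prev_month, sector)], counts[(curr_month, sector)]
    return (curr - prev) / prev if prev else float("inf")

print(trend_score("tourism", "2024-05", "2024-06"))  # 0.5 -> growing interest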


AI Red-Teaming is a Sociotechnical System. Now What?

Gillespie, Tarleton, Shaw, Ryland, Gray, Mary L., Suh, Jina

arXiv.org Artificial Intelligence

Whether tapped directly on the web, or embedded in software suites, search engines, and social media platforms, LLMs are everywhere. When a technology jumps this quickly from theoretical plaything to consumer service, many other elements are also settling in around it, without much forethought: interfaces, policies, business models, labor arrangements, infrastructural assurances, complementary technologies, public claims, advertising campaigns, regulations. Researchers studying the workings and implications of these technologies, across computer science, engineering, the social sciences, humanities, and law, must gear up just as fast to study not just the core technology, but the sociotechnical system taking shape around it [19]. Many of these decisions, arrangements, and infrastructures may turn out to be as consequential for users and the broader public as the core technology itself. But the boisterous promises and debates that surround a new technology can obscure these other essential elements that make technologies always more than the sum of their engineered parts. In this essay, we hope to call upon computer scientists and social scientists alike to pay closer, critical attention to the phenomenon of "red-teaming."


A Big Data-empowered System for Real-time Detection of Regional Discriminatory Comments on Vietnamese Social Media

Huynh, An Nghiep, Do, Thanh Dat, Do, Trong Hop

arXiv.org Artificial Intelligence

Regional discrimination is a persistent social issue in Vietnam. While existing research has explored hate speech in the Vietnamese language, the specific issue of regional discrimination remains under-addressed. Previous studies primarily focused on model development without considering practical system implementation. In this work, we propose a task called Detection of Regional Discriminatory Comments on Vietnamese Social Media, leveraging the power of machine learning and transfer learning models. We have built the ViRDC (Vietnamese Regional Discrimination Comments) dataset, which contains comments from social media platforms, providing a valuable resource for further research and development. Our approach integrates streaming capabilities to process real-time data from social media networks, ensuring the system's scalability and responsiveness. We developed the system on the Apache Spark framework to efficiently handle increasing data inputs during streaming. Our system offers a comprehensive solution for the real-time detection of regional discrimination in Vietnam.
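
A minimal Spark Structured Streaming skeleton of this kind of system is sketched below: incoming comments are scored by a classifier wrapped in a UDF, and the flagged ones are streamed out. The keyword stub stands in for the trained ViRDC model, and the socket source stands in for a real social media feed; neither is the authors' implementation.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.appName("regional-discrimination-demo").getOrCreate()

def classify(comment):
    # Stub classifier: the real system would apply a trained transfer-learning model.
    return 1 if comment and "placeholder_slur" in comment.lower() else 0

classify_udf = udf(classify, IntegerType())

# Socket source as a stand-in for a live social media comment stream.
comments = (spark.readStream.format("socket")
            .option("host", "localhost").option("port", 9999).load())

flagged = (comments.withColumn("label", classify_udf(col("value")))
           .filter(col("label") == 1))

query = flagged.writeStream.outputMode("append").format("console").start()
query.awaitTermination()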


Cyberbullying or just Sarcasm? Unmasking Coordinated Networks on Reddit

Pamecha, Pinky, Shah, Chaitya, Jain, Divyam, Gandhi, Kashish, Bhowmick, Kiran, Narvekar, Meera

arXiv.org Artificial Intelligence

With the rapid growth of social media usage, a common trend has emerged where users often make sarcastic comments on posts. While sarcasm can sometimes be harmless, it can blur the line with cyberbullying, especially when used in negative or harmful contexts. This growing issue has been exacerbated by the anonymity and vast reach of the internet, making cyberbullying a significant concern on platforms like Reddit. Our research focuses on distinguishing cyberbullying from sarcasm, particularly where online language nuances make it difficult to discern harmful intent. This study proposes a framework using natural language processing (NLP) and machine learning to differentiate between the two, addressing the limitations of traditional sentiment analysis in detecting nuanced behaviors. By analyzing a custom dataset scraped from Reddit, we achieved a 95.15% accuracy in distinguishing harmful content from sarcasm. Our findings also reveal that teenagers and minority groups are particularly vulnerable to cyberbullying. Additionally, our research uncovers coordinated graphs of groups involved in cyberbullying, identifying common patterns in their behavior. This research contributes to improving detection capabilities for safer online communities.
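
As a point of reference for the classification setup described, the sketch below wires a TF-IDF representation into a logistic regression classifier with scikit-learn; the two-example training set and the model choice are illustrative assumptions rather than the study's actual features or Reddit data.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["oh great, another genius idea",           # sarcasm
         "nobody likes you, just leave already"]    # harmful content
labels = ["sarcasm", "cyberbullying"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["wow, what a brilliant take"]))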